[57] ABSTRACT

A processing element constituting the basic building 
block of a massively-parallel processor. Fundamentally, 
the processing element includes an arithmetic sub-unit 
comprising registers for operands, a sum-bit register, a 
carry-bit register, a shift register of selectively variable 
length, and a full adder. A logic network is included 
with each processing element for performing the basic 
Boolean logic functions between two bits of data. There 
is also included a multiplexer for intercommunicating 
with neighboring processing elements and a register for 
receiving data from and transferring data to neighbor- 
ing processing elements. Each such processing element 
includes its own random access memory which commu- 
nicates with the arithmetic sub-unit and the logic net- 
work of the processing element. 

15 Claims, 8 Drawing Figures 
PROCESSING ELEMENT FOR PARALLEL ARRAY PROCESSORS

BACKGROUND OF THE INVENTION

The invention described herein was made in the
performance of work under NASA Contract No.
NAS5-25392 and is subject to the provisions of § 305
of the National Aeronautics and Space Act of 1958 (72
Stat. 435; 42 U.S.C. 2457).

The instant invention resides in the art of data
processors and, more particularly, with large scale parallel
processors capable of handling large volumes of data in
a rapid and cost-effective manner. Presently, the
demands on data processors are such that large pluralities
of data must be arithmetically and logically processed in
short periods of time for purposes of constantly updating
[illegible] 10^13 bits per day. For such an imaging
system, a variety of image processing tasks such as
geometric correction, correlation, image registration,
feature selection, multispectral classification, and area
measurement are required to extract useful information
from the mass of data obtained. Indeed, it is expected that
the work load for a data processing system utilized in
association with such orbiting image sensors would fall
somewhere between 10^9 and 10^10 operations per second.

High speed processing systems and sophisticated
parallel processors, capable of simultaneously operating
on a plurality of data, have been known for a number of
years. Indeed, applicant's prior U.S. Pat. Nos.
3,800,289; 3,812,467; and 3,936,806 all relate to a
structure for vastly increasing the data processing capability
of digital computers. Similarly, U.S. Pat. No. 3,863,233,
assigned to Goodyear Aerospace Corporation, the
assignee of the instant application, relates specifically to a
data processing element for an associative or parallel
processor which also increases data processing speed by
[illegible] arithmetic units, one for each [illegible].
[illegible] advancements of these prior art teachings do
not possess the capability of cost effectively handling the
volume of data previously described. A system of the
required nature includes thousands of processing
elements, each including its own arithmetic and logic
network operating in conjunction with its own memory,
while possessing the capability of communicating with
other similar processing elements within the system.
With thousands of such processing elements operating
simultaneously (massive-parallelism), the requisite
speed may be achieved. Further, since typical
satellite images include millions of picture elements or
pixels that can generally be processed at the same time,
such a structure lends itself well to the solution of the
aforementioned problem.

In a system capable of processing a large volume of
data in a massively-parallel manner, it is most desirable
that the system be capable of performing bit-serial
mathematics for cost effectiveness. However, in order
to increase speed in the bit-serial computation, it is
most desirable that a variable length shift register be
included such that various word lengths may be
accommodated. Further, it is desirable that the massive array
of processing elements be capable of intercommunication
such that data may be moved between and among
at least neighboring processing elements. Further, it is
desirable that each processing element be capable of
performing all of the Boolean operations possible
between two bits of data, and that each such processing
element include its own random access memory. Yet
further, for such a system to be efficient, it should
include means for bypassing inoperative or malfunctioning
processing elements without diminishing system
integrity.

OBJECTS OF THE INVENTION

In light of the foregoing, it is an object of an aspect of
the invention to provide a plurality of processing
elements for a parallel array processor wherein each such
element includes a variable length shift register for at
least assisting in arithmetic computations.

Yet another object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein each such processing element
[illegible]

Another object of an aspect of the invention is to provide
a plurality of processing elements for a parallel array
processor wherein each such processing element is
capable of performing bit-serial mathematical
computations.

An additional object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein each such processing element
is capable of performing all of the Boolean functions
capable of being performed between two bits of binary
data.

Yet a further object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein each such processing element
includes its own memory and data bus.

Still a further object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein certain of said processing
elements may be bypassed should they be found to be
inoperative or malfunctioning, such bypassing not
diminishing the system integrity.

Yet another object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
[illegible] of a large plurality of data in a time-efficient
manner.

SUMMARY OF THE INVENTION

The foregoing and other objects of aspects of the
invention are achieved by a matrix of a plurality of
processing elements interconnected with each other
wherein each such processing element comprises: a
memory; an adder; and communication means connected
to neighboring processing elements within said
matrix and further connected to said adder and memory
for transferring data between said memory, adder, and
neighboring processing elements.

DESCRIPTION OF DRAWINGS

For a complete understanding of the objects,
techniques, and structure of the invention, reference should
be had to the following detailed description and
accompanying drawings wherein:

FIG. 1 is a block diagram of a massively-parallel
processing system according to the invention, showing
the interconnection of the array unit incorporating a
plurality of processing elements;

FIG. 2 is a block diagram of a single processing
element, comprising the basic building block of the array
unit of FIG. 1;
FIG. 3, consisting of FIGS. 3A-3C, constitutes a
circuit schematic of the control signal generating
circuitry of the processing elements maintained upon a
chip and including the sum-or and parity trees;

FIG. 4 is a detailed circuit schematic of the
fundamental circuitry of a processing element of the
invention;

FIG. 5, comprising FIGS. 5A and 5B, presents circuit
schematics of the switching circuitry utilized in
removing an inoperative or malfunctioning processing
element from the array unit.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT

Referring now to the drawing and more particularly
FIG. 1, it can be seen that a massively-parallel processor
is designated generally by the numeral 10. A key
element of the processor 10 is the array unit 12 which, in a
preferred embodiment of the invention, includes a
matrix of 128×128 processing elements, for a total of
16,384 processing elements to be described in detail
hereinafter. The array unit 12 inputs data on its left side
and outputs data on its right side over 128 parallel lines.
The maximum transfer rate of 128-bit columns of data is
10 MHz for a maximum bandwidth of 1.28 billion bits per
second. Input, output, or both, can occur
simultaneously with processing.

Electronic switches 24 select the input of the array
unit 12 from the 128-bit interface of the processor 10, or
from the input register 16. Similarly, the array 12 output
may be steered to the 128-bit output interface of the
processor 10 or to the output register 14 via switches 26.
These switches 24,26 are controlled by the program and
data management unit 18 under suitable program
control. Control signals to the array unit 12 and status bits
from the array unit may be connected to the external
control interface of the processor 10 or to the array
control unit 20. Again, this transfer is achieved by
electronic switches 22, which are under program control of
the unit 18.

The array control unit 20 broadcasts control signals
and memory addresses to all processing elements of the
array unit 12 and receives status bits therefrom. It is
designed to perform bookkeeping operations such as
address calculation, loop control, branching, subroutine
calling, and the like. It operates simultaneously with the
processing element control such that full processing
power of the processing elements of the array unit 12
can be applied to the data to be handled. The control
unit 20 includes three separate control units: the
processing element control unit executes micro-coded
vector processing routines and controls the processing
elements and their associated memories; the input/output
control unit controls the shifting of data through the
array unit 12; and the main control unit executes the
application programs, performs the scalar processing
internally, and makes calls to the processing element
control unit for all vector processing.

The program and data management unit 18 manages
data flow between the units of the processor 10, loads
programs into the control unit 20, executes system tests
and diagnostic routines, and provides program
development facilities. The details of such structure are not
important for an understanding of the instant invention,
but it should be noted that the unit 18 may readily
comprise a mini-computer such as the Digital Equipment
Corporation (DEC) PDP-11/34 with interfaces to the
control unit 20, array unit 12 (registers 14,16), and the
external computer interface. As is well known in the art,
the unit 18 may also include peripheral equipment such
as magnetic tape drive 28, disks 30, a line printer 32, and
an alphanumeric terminal 34.

While the structure of FIG. 1 is of some significance
for an appreciation of the overall system incorporating
the invention, it is to be understood that the details
thereof are not necessary for an appreciation of the
scope and breadth of applicant's inventive concept.
Suffice it to say at this time that the array unit 12
comprises the inventive concept to be described in detail
herein and that the array includes a large plurality of
interconnected processing elements, each of which has
its own local memory, is capable of performing
arithmetic computations, is capable of performing a full
complement of Boolean functions, and is further capable of
communicating with at least the processing elements
orthogonally neighboring it on each side, hereinafter
referenced as north, south, east, and west.

With specific reference now to FIG. 2, it can be seen
that a single processing element is designated generally
by the numeral 36. The processing element itself
includes a P register which, together with its input
logic 40, performs all logic and routing functions for the
processing element 36. The A, B, and C registers, the
variable length shift register 48, and the associated
full adder 50 comprise the arithmetic unit of
the processing element 36. The G register 52 is
provided to control masking of both arithmetic and logical
operations, while the S register 54 is used to shift data
into and out of the processing element 36 without
disturbing operations thereof. Finally, the aforementioned
elements of the processing element 36 are connected to
a uniquely associated random access memory 56 by
means of a bi-directional data bus 58.

As presently designed, the processing element 36 is
reduced by large scale integration to such a size that a
single chip may include eight such processing elements
along with a parity tree, a sum-or circuit, and associated
control decode. In the preferred embodiment of the
invention, the eight processing elements on a chip are
provided in a two row by four column arrangement.
Since the size of random access memories presently
available through large scale integration is rapidly
changing, it is preferred that the memory 56, while
comprising a portion of the processing element 36, be
maintained separate from the integrated circuitry of the
remaining structure of the processing elements such
that, when technology allows, larger memories may be
incorporated with the processing elements without
altering the total system design.

The data bus 58 is the main data path for the
processing element 36. During each machine cycle it can
transfer one bit of data from any one of six sources to one or
more destinations. The sources include a bit read from
the addressed location in the random access memory 56,
the state of the B, C, P, or S registers, or the state of the
equivalence function generated by the element 60 and
indicating the state of equivalence existing between the
outputs of the P and G registers. The equivalence
function is used as a source during a masked-negate
operation.

The destinations of a data bit on the data bus 58 are
the addressed location of the random access memory
56, the A, G, or S registers, the logic associated with the
P register, the input to the sum-or tree, and the input to
the parity tree.

Before considering the detailed circuitry of the pro-
cessing element 36, attention should be given to FIG. 3
wherein the circuitry 62 for generating the control 
signals for operating the processing elements is shown. 
The circuitry of FIG. 3 is included in a large scale 
integrated chip which includes eight processing ele- 
ments, and is responsible for controlling those associ- 
ated elements. Fundamentally, the circuitry of FIG. 3 
includes decode logic receiving control signals on lines
L0-LF under program control and converts those sig-
nals into the control signals K1-K27 for application to
the processing elements 36, sum-or tree, and parity tree. 
Additionally, the circuitry of FIG. 3 generates from the 
main clock of the system all other clock pulses neces- 
sary for control of the processing element 36. 

One skilled in the art may readily deduce from the
circuitry of FIG. 3 the relationship between the pro-
grammed input function on the lines L0-LF and the
control signals K1-K27. For example, the inverters
64,66 result in K1 = LC. Similarly, inverters 68-72 and
NAND gate 74 result in K16 = L0·L1. By the same
token, K18 = L2·L3·L4·L6.

Clock pulses for controlling the processing elements
36 are generated in substantially the same manner as the
control signals. The same would be readily apparent to
those skilled in the art from a review of the circuitry 62
of FIG. 3. For example, the clock S-CLK = S-CLK-
ENABLE·MAIN CLK by virtue of inverters 76,78 and
NAND gate 80. Similarly, clock G-CLK = L8·MAIN
CLK by virtue of inverters 76,78 and NAND gate 84.

With further respect to the circuitry 62 of FIG. 3, it 
can be seen that there is provided means for determin- 
ing parity error and the sum-or of the data on the data 
bus of all processing elements. The data bit on the data 
bus may be presented to the sum-or tree, which is a tree 
of inclusive-or logic elements which forms the inclu- 
sive-or of all processing element data bus states and 
presents the results to the array control unit 20. 
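
As a rough illustration of the sum-or and parity checking discussed in this part of the description, the following Python sketch models a two-level inclusive-or (eight inputs per chip, then 2048 chip outputs) and a per-chip parity check. The function names and the list-of-bits representation are assumptions made for illustration only; the sketch is behavioral and does not reproduce the gating of FIG. 3.

```python
from functools import reduce
from operator import xor

PES_PER_CHIP = 8   # processing elements per chip (from the text)
CHIPS = 2048       # 16,384 / 8 chip outputs feed the external or-tree


def chip_sum_or(bus_bits):
    """Eight-input on-chip sum-or: 1 if any data-bus bit on the chip is 1."""
    assert len(bus_bits) == PES_PER_CHIP
    return int(any(bus_bits))


def array_sum_or(all_bus_bits):
    """Two-level inclusive-or over every processing element's data-bus bit."""
    assert len(all_bus_bits) == PES_PER_CHIP * CHIPS
    chip_outputs = [
        chip_sum_or(all_bus_bits[i:i + PES_PER_CHIP])
        for i in range(0, len(all_bus_bits), PES_PER_CHIP)
    ]
    return int(any(chip_outputs))


def parity_tree(bus_bits):
    """Exclusive-OR of the eight data-bus bits on one chip."""
    return reduce(xor, bus_bits, 0)


def parity_error_on_read(read_bits, stored_parity):
    """1 if the bits read back disagree with the parity stored at write time."""
    return parity_tree(read_bits) ^ stored_parity
```

A single set bit anywhere among the 16,384 inputs drives `array_sum_or` to 1, and a single flipped bit in a stored word makes `parity_error_on_read` return 1, which corresponds to setting the parity-error flip-flop.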
In order to detect the presence of processing elements
in certain states, groups of eight processing elements are
ORed together in an eight input sum-or tree whose
output is then fed to a 2048-input or-tree external to the
chip to achieve a sum-or of all 16,384 processing
elements.

Errors in the random access memory 56 may be
determined in standard fashion by parity-generation and
checking circuitry. With each group of eight processing
elements 36 there is a parity-error flip-flop 86 which is
set to a logic 1 whenever a parity error is detected in an
associated random access memory 56. As shown in the
circuitry 62, the sum-or tree comprises the three gates
designated by the numeral 88 while the parity error tree
consists of the seven exclusive-OR gates designated by
the numeral 90. During read operations, the parity
output is latched in the flip-flop 86 at the end of the cycle
by the M-clock. During write operations, parity is
outputted to a parity memory through the parity-bit pin of
the chip. The parity memory comprises a ninth random
access memory similar to the elements 56. The parity
state stored at the parity bit during write operations is
exclusive-ORed with the output of the parity tree 90
during read operations to affect the latch 86.

As shown, control signal K23 determines whether a
read or write operation is being performed, while K24 is
used for clearing the parity-error flip-flop 86. The
sum-or tree 88 OR's all of the data bits D0-D7 on the
associated data bus lines of the eight processing elements 36 of
the chip. As can be seen, both the parity outputs and the
sum-or outputs are transferred via the same gating
matrix 92, which is controlled by K27 to determine
whether parity or sum-or will be transferred from the
chip to the array control unit 20. The outputs of the
flip-flops 86 of each of the processing elements are
connected to the 2048 input sum-or tree such that the
presence of any set flip-flop 86 might be sensed. By using a
flip-flop which latches upon an error, the array control
unit 20 can sequentially disable columns of processing
elements until that column containing the faulty
element is found.

Finally, and as will be discussed further hereinafter,
control signal K25 is used to disable the parity and
sum-or outputs from the chip when the chip is disabled
and no longer used in the system.

While the utilization of sum-or and parity functions
is known in the art, their utilization in the instant
invention is important to assist in locating faulty
processing elements such that those elements may be removed
from the operative system. The trees 88,90, mutually
exclusively gated via the network 92, provide the
capability for columns of processing elements 36 to be
checked for parity and further provide the sum-or
network to determine the presence of processing
elements in particular logic states, such as to determine the
responder to a search operation. The number of circuit
elements necessary for this technique has been kept to
a minimum by utilizing a single output for the two trees,
with that output being multiplexed under program
control.

With final attention to FIG. 3, it can be seen that the
disable signal, utilized for removing an entire column of
processing element chips from the array unit 12,
generates the signals K25,K26 for this purpose. As mentioned
above, the control signal K25 disables the sum-or and
parity outputs for associated processing elements.
Further functions of the signals K25,K26 with respect to
removing selected processing elements will be
discussed with respect to FIG. 5 hereinafter.



With reference now to FIG. 4, and correlating the 
same to FIG. 2, it can be seen that the full adder of the 
invention comprises logic gates 94-100. This full adder 
communicates with the B register comprising flip-flop 
102 which receives the sum bit, the C register which 
comprises flip-flop 104 which receives the carry bit, and 
further communicates with the variable length shift 
register 48 which comprises 16, 8, and 4 bit shift regis-
ters 106-110, flip-flops 112,114, and multiplexers
116-120.
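
The arithmetic behind the variable length can be checked with a short Python sketch. It assumes, as an interpretation not spelled out in the text, that the two flip-flops 112,114 always remain in the path and that each of the 16-, 8-, and 4-bit segments may be bypassed independently; under that assumption the selectable lengths come out as 2 through 30 in steps of 4, which matches the lengths listed elsewhere in this description. The function names are illustrative only.

```python
from itertools import combinations

SEGMENTS = (16, 8, 4)  # shift registers 106-110 (from the text)
FIXED_STAGES = 2       # flip-flops 112 and 114, assumed always in the path


def selectable_lengths():
    """Every shift-register length reachable by bypassing a subset of segments."""
    lengths = set()
    for r in range(len(SEGMENTS) + 1):
        for kept in combinations(SEGMENTS, r):
            lengths.add(FIXED_STAGES + sum(kept))
    return sorted(lengths)


def round_trip_lengths():
    """Add the two stages of delay contributed by the A and B registers."""
    return [n + 2 for n in selectable_lengths()]
```

With all three segments in the path the register is 30 stages long, and the A and B registers bring the longest round trip to 32 stages.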

The adder receives an input from the shift register,
the output of the A register 122, and an input from the
logic and routing sub-unit, the output of the P register
124. Whenever control line K21 is a logic 1 and BC-CLK
is clocked, the adder adds the two input bits from
registers A and P to the carry bit stored in the C register 
104 to form a two-bit sum. The least significant bit of 
the sum is clocked into the B register 102 and the most 
significant bit of the sum is clocked into the C register 
104 so that it becomes the carry bit for the next machine 
cycle. If K21 is at a logic 0, a 0 is substituted for the P 
bit. 
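
The bit-serial addition just described can be sketched in Python as follows. The carry update uses the J-K form that this description gives for flip-flop 104 (J = A·P, complement of K = A∨P), and the sketch checks it against an ordinary full adder. The function names, the integer packing, and the least-significant-bit-first loop are assumptions for illustration, not part of the disclosed circuit.

```python
def jk_next(q, j, k):
    """J-K flip-flop rule: Q <- J AND (NOT Q)  OR  (NOT K) AND Q."""
    return (j & (q ^ 1)) | ((k ^ 1) & q)


def adder_step(a, p, c, k21=1):
    """One machine cycle: returns (new B register, new C register)."""
    p_in = p if k21 else 0       # K21 = 0 substitutes a 0 for the P bit
    b_next = a ^ p_in ^ c        # sum bit clocked into B
    j = a & p_in                 # J input of the carry flip-flop
    k = (a | p_in) ^ 1           # K input, so that NOT K = A OR P
    c_next = jk_next(c, j, k)    # carry bit clocked into C
    return b_next, c_next


def add_bit_serial(x, y, width):
    """Add two unsigned integers one bit per cycle, least significant bit first."""
    c = 0
    result = 0
    for i in range(width):
        b, c = adder_step((x >> i) & 1, (y >> i) & 1, c)
        result |= b << i
    return result, c
```

For example, `add_bit_serial(13, 11, 8)` takes eight cycles and leaves the sum 24 with a final carry of 0.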

As shown, control line K12 sets the C register 104 to 
the logic 1 state while control line K13 resets the C 
register to the logic 0 state. Control line K16 passes the
state of the B register 102 onto the bi-directional data
bus 58, while control line K22 transfers the output of
the C register to the data bus.
In operation, the full adder of FIG. 4 incorporates a
carry function expressed as follows:

$$C \leftarrow AP \lor PC \lor AC$$

The new state of the carry register C, flip-flop 104, is
equivalent to the states of the A and P registers ANDed
together, or the states of the P and C registers ANDed
together, or the states of the A and C registers ANDed
together. This carry function is achieved, notwithstanding
the fact that there is no feedback of C register
outputs to C register inputs, because the JK flip-flop 104
follows the rule:

$$C \leftarrow J\overline{C} \lor \overline{K}C.$$

The new state of the C register is the complement of the
present state of the C register ANDed with the J input
or the complement of the K input ANDed with the
present state of the C register. Accordingly, in the
circuit of FIG. 4, the flip-flop 104 follows the rule:

$$C \leftarrow AP\overline{C} \lor (A \lor P)C.$$

The expression immediately above is equivalent to the
carry function first given.

With respect to the sum expression, the B register,
flip-flop 102, receives a sum bit which is an exclusive
OR function of the states of the A, P, and C registers
according to the expression:

$$B \leftarrow A \oplus P \oplus C.$$

An exclusive OR of A and P is formed, and gate 100
exclusive OR's that result with C to achieve the sum bit.

The shift register of the processing element 36 has 30
stages. These stages allow for the shift registers to have
varying lengths so as to accommodate various word
sizes, substantially reducing the time for arithmetic
operations in serial-by-bit calculations, such as occur in
multiplication. Control lines K1-K4 control multiplexers
116-120 so that certain parts of the shift register may be
bypassed, causing the length of the shift register to be
selectively set at either 2, 6, 10, 14, 18, 22, 26, or 30
stages. Data bits are entered into the shift register
through the B register 102, these being the sum bits from
the adder. The data bits leave the shift register through
the A register 122 and recirculate back through the
adder. The A and B registers add two stages of delay to
the round-trip path. Accordingly, the round-trip length
of an arithmetic process is either 4, 8, 12, 16, 20, 24, 28,
or 32 stages, depending upon the states of the control
lines K1-K4 as they regulate the multiplexers 112-120.

The shift register outputs data to the A register 122
which has two other inputs selectable via control lines
K1, K2, and multiplexer 120. One input is a logic 0. This
is used to clear the shift register to an all-zero state. The
other input is the bi-directional data bus 58. This may be
used to enter data directly into the adder.

The A register 122 is clocked by A-CLK, and the
other thirty stages of the shift register are clocked by
SR-CLK. Since the last stage of the shift register has a
separate clock, data from the bi-directional data bus 58
or logic 0 may be entered into the adder without
disturbing data in the shift register.

As discussed above, the P register 124 provides an
input to the adder 50 with such input being supplied
from one of the orthogonally contiguous processing
elements 36, or from the data bus 58. Data is received by
the P register 124 from the P register of neighboring
processing elements 36 by means of the multiplexer 126
[illegible] transferring [illegible] gates 130,132.
[illegible] complement of the data [illegible] inputs
respectively of the flip-flop 124, [illegible] P-CLK.

As noted, the true and complement outputs of the P
flip-flop 124 are also adapted to be passed to the P
flip-flops of neighboring processing elements 36. The
complement is passed off of the chip containing the
immediate processing element, but is inverted by a driver at the
destination to supply the true state of the P flip-flop.
The true state is not inverted and is applied to
neighboring processing elements on the same chip. The logic
circuitry 40 is shown in more detail in FIG. 4 to be
under control of control lines K8-K11. This logic
receives data from the data bus 58 either in the true state
or complementary through the inverter 130. The logic
network 40, under control of the control signals
K8-K11, is then capable of performing all sixteen
Boolean logic functions which may be performed between
the data from the data bus and that maintained in the P
register 124. The result is then stored in the P register
124.

It will be understood that with K7=0, gates 130,132
are disabled. Control lines K8 and K9 then allow either
0, 1, D or $\overline{D}$ to be sent to the J input.
Independently, control lines K10 and K11 allow 0, 1, D or
$\overline{D}$ to be sent to the K input. Following the
rule of J-K flip-flop operation, the new state of the P
register is defined as follows:

$$P \leftarrow J\overline{P} \lor \overline{K}P.$$

As can be seen, in selecting all four states of J and all
four states of K, all sixteen logic functions of P and D
can be obtained.

As discussed above, the output of the P register may
be used in the arithmetic calculations of the processing
elements 36, or may be passed to the data bus 58. If K21
is at a logic 1, the current state of the P register is
enabled to the adder logic. If K14 is a logic 0, the output
of the P register is enabled to the data bus. If K15 is at
a logic 0, the output of the P register is exclusively
OR'ed with the complement of the G register 132, and
the result is enabled to the data bus. It will be noted that
certain transfers to the data bus are achieved via
bi-directional transmission gates 134,136, respectively
enabled by control signals K14 and K15. These types of
gates are well known to those skilled in the art.

The mask register G, designated by the numeral 132,
comprises a simple D-type flip-flop. The G register
reads the state of the bi-directional data bus on the
positive transition of G-CLK. Control line K19 controls the
masking of the arithmetic sub-unit clocks (A-CLK,
SR-CLK, and BC-CLK). When K19 equals 1, these
clocks will only be sent to the arithmetic sub-units of
those processing elements where G=1. The arithmetic
sub-units of those processing elements where G=0 will
not be clocked and no register in those sub-units will
change state. When K19=0, the arithmetic sub-units of
all processing elements will participate in the operation.

Control line K20 controls the masking of the logic
and routing sub-unit. When K20=1, the clock P-CLK
is only sent to the logic and routing sub-units of those
processing elements where G=1. The logic and routing
sub-units of those processing elements where G=0 will
not be clocked and their P registers will not change
state.

Translation operations are masked when control line
K20=1. In those processing elements where G=1, the
P register is clocked by P-CLK and receives the state of
its neighbor. In those where G=0, the P register is not
clocked and does not change state. Regardless of
whether G=0 or G=1, each processing element sends
the state of its P register to its neighbors.
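
A masked translation of this kind can be sketched for a single row of processing elements. In the Python sketch below every element offers its P state to its east neighbor, but only elements with G=1 (or all elements when K20=0) accept the incoming bit. The row-as-list representation, the function name, and the `west_edge` value fed to the west-most element are assumptions for illustration.

```python
def masked_translate_east(p_row, g_row, k20, west_edge=0):
    """Move P states one place east, honoring the G mask when K20 = 1."""
    # Every element sends its P state east regardless of its own mask bit.
    incoming = [west_edge] + p_row[:-1]
    return [
        new if (k20 == 0 or g == 1) else old   # unclocked elements keep their state
        for old, new, g in zip(p_row, incoming, g_row)
    ]
```

With `p_row = [1, 0, 1, 1]` and `g_row = [1, 1, 0, 1]`, a masked step leaves the third element unchanged while the others take their west neighbor's state.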

Brief attention is now given to the equivalence func-
tion provided for by the exclusive OR gate 138, which
provides a logic 1 output when the inputs thereof from
the P and G registers are of common logic states. In
other words, the gate 138 provides the output function
of $P \odot G$. This result is then supplied to the data bus.

The S register comprises a D-type flip-flop 140 with 
the input thereto under control of the multiplexer 142. 
The output from the S register is transmitted to the data
bus 58 by means of the bi-directional transmission gate
144. The flip-flop 140 reads the state of its input on the
transition of the clock pulse S-CLK-IN. When control
line K17 is at a logic 0, the multiplexer 142 receives the
state of the S register of the processing element immedi-
ately to the west. In such case, each S-CLK-IN pulse
will shift the data in the S registers one place to the east.
To store the state of the S register 140 in local memory,
control line K18 is set to a logic 0 to enable the bi-direc- 
tional transmission gate 144 to pass the complementary 
output of the S register 140 through the inverter 146 
and to the data bus 58. The S register 140 may be loaded 
with a data bit from the local memory 56 by setting K17 
to a logic 1, and thus enabling the data bus 58 to the 
input of the flip-flop 140. 
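
The two S register input modes can be sketched for one row of processing elements. In the Python sketch below, K17=0 shifts the row one place east on each S-CLK-IN pulse and K17=1 loads each S register from its own data bus. The function name, the list representation, and the `west_edge` bit entering the west-most element are assumptions for illustration.

```python
def s_clk_in(s_row, k17, bus_row=None, west_edge=0):
    """One S-CLK-IN pulse applied to a row of S registers."""
    if k17 == 0:
        # Multiplexer 142 selects the S register immediately to the west.
        return [west_edge] + s_row[:-1]
    # K17 = 1 selects each element's own data bus (e.g. a bit read from memory).
    return list(bus_row)
```

Three pulses with K17=0 and west-edge bits 1, 0, 1 move a four-element row from `[0, 0, 0, 0]` to `[1, 0, 1, 0]`, without involving the P registers or the adder.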

As mentioned hereinabove, a particular attribute of
the massively-parallel processor 10 is that the array unit
12 is capable of bypassing a set of columns of processing
elements 36 should an error or fault appear in that set.
As discussed earlier herein, each chip has two
processing elements 36 in each of four columns of the array unit
matrix. The instant invention disables columns of chips
and, accordingly, sets of columns of processing
elements. Fundamentally, the columns are dropped out of
operation by merely jumping the set of columns by
interconnecting the inputs and outputs of the east-most
and west-most processing elements on the chips
establishing the set of columns. The method of inhibiting the
outputs of the sum-or tree and the parity tree of the
chips has previously been described. However, it is
also necessary to bypass the outputs of the P and S
registers which intercommunicate between the east and
west neighboring chips.

As shown in FIG. 5A, a chip includes eight
processing elements, PE0-PE7, arranged as earlier described.
[...] neighboring east chip are disabled. That is, control
signal K25 may inhibit output gates 148,150 while
concurrently enabling the bypass gates 152,154. This
interconnects S-IN0 with S-OUT3 and S-IN7 with S-OUT4,
for all chips in the column.

In FIG. 5B it can be seen that communications
between the P registers of east-west neighboring chips
may also be bypassed. P register data is received from
the chip to the west via inverters 156,158 and is
transmitted thereto by gates 160,162. Similarly, P register
data is received from the chip to the east via inverters
164,166 and is transmitted thereto via gates 168,170. If
the chip is enabled and P register data is to be routed to
the west, then control line K6 is set to a logic 1 and K26
to a logic 0 so gates 160,162 are enabled and gates
168,170 are disabled. When routing to the east, K6 is set
to zero and K26 to one. To disable the chip, K6 and K26
are both set to a logic 0 to disable all P register east-west
outputs from the chip and K25 is set to allow the
bi-directional bypass gates 172,174 to interconnect
WEST-0 with EAST-3 and WEST-7 with EAST-4.
This connects the P registers of PE3 of the west chip
with PE0 of the east chip and PE4 of the west chip with
PE7 of the east chip.

By disabling the parity and sum-or trees and by
jumping the inputs and outputs of bordering P and S registers
of the chips in a column, an entire column of chips may
be removed from service if a fault is detected. It will be
understood that while the processing elements of the
disabled chips do not cease functioning when disabled,
the outputs thereof are simply removed from affecting
the system as a whole. Further, it will be appreciated
that, by removing columns, no action need be taken
with respect to intercommunication between north and
south neighbors. Finally, by removing entire chips
rather than columns of processing elements, the amount
of bypass gating is greatly reduced.

In the preferred embodiment of the invention, the
array unit 12 has 128 rows and 132 columns of
processing elements 36. In other words, there are 64 rows and
33 columns of chips. Accordingly, there is an extra
column of chips beyond those necessary for achieving
the desired square array. This allows for the
maintenance of a square array even when a faulty chip is found
and a column of chips is to be removed from service.

Thus it can be seen that the objects of the invention 
have been satisfied by the structure presented herein- 
above. A massively-parallel processor, having a unique 
array unit of a large plurality of interconnected and 
intercommunicating processing elements achieved 
rapid parallel processing. A variable length shift regis- 
ter allows serial-by-bit arithmetic computations in a 
rapid fashion, while reducing system cost. Each pro- 
cessing element is capable of performing all requisite 
mathematical computations and logic functions and is 
further capable of intercommunicating not only with 
neighboring processing elements, but also with its own 
uniquely associated random access memory. Provisions 
are made for removing an entire column of processing 



The S register of each processing element may receive 

data from the S register of the processing element im- 60 chips wherein at least one processing element has been 

mediately to the west and may transfer data to the S found to be faulty. All of this structure leads to a highly 

register of the processing element immediately to the reliable data processor which is capable of handling 

east. When enabled, the chip allows data to flow from large magnitudes of data in rapid fashion, 

S-INO, through the S registers of PE0-PE3 and then While in accordance with the patent statutes, only the 

out of S-OUT3 to the neighboring chip. Similar data 65 best mode and preferred embodiment of the invention 

flow occurs from S-1N7 to S-OUT4. When it is desired has been presented and described in detail, it is to be 

to disable a column of chips, the output gates of the understood that the invention is not limited thereto or 

column of chips which pass the S register data to the thereby. Consequently, for an appreciation of the true 



06/08/2004, EAST version: 1.4.1 



11 



4,314,349 



scope and breadth of the invention, reference should be 
had to the following claims. 
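The column-bypass arrangement described for FIGS. 5A and 5B may be pictured with a small software model. The Python sketch below is illustrative only and is not the gate-level circuit of the specification. The function names (`p_route_controls`, `active_pe_columns`, `east_neighbor`) are hypothetical, the bypass enable is modeled as a plain boolean because the text does not state the polarity of K25, and chip columns are assumed to be numbered from 0 at the west edge.

```python
PE_COLS_PER_CHIP = 4   # each chip spans four columns of processing elements
CHIP_COLUMNS = 33      # 32 needed for a 128-wide array, plus one spare
PE_COLUMNS = CHIP_COLUMNS * PE_COLS_PER_CHIP  # 132 physical columns


def p_route_controls(mode):
    """Return (K6, K26, bypass_enabled) for 'west', 'east', or 'disabled'.

    K6/K26 values follow the description of FIG. 5B; bypass_enabled is an
    assumed boolean stand-in for the bypass gating controlled by K25.
    """
    table = {
        "west": (1, 0, False),      # west output gates on, east gates off
        "east": (0, 1, False),      # east output gates on, west gates off
        "disabled": (0, 0, True),   # all outputs off, bypass gates on
    }
    return table[mode]


def active_pe_columns(disabled_chip_column=None):
    """List the physical PE columns still in service, west to east."""
    cols = []
    for chip in range(CHIP_COLUMNS):
        if chip == disabled_chip_column:
            continue  # the bypass gates jump this whole chip column
        start = chip * PE_COLS_PER_CHIP
        cols.extend(range(start, start + PE_COLS_PER_CHIP))
    return cols


def east_neighbor(physical_col, disabled_chip_column=None):
    """Physical column that receives data shifted east from physical_col."""
    cols = active_pe_columns(disabled_chip_column)
    i = cols.index(physical_col)
    return cols[i + 1] if i + 1 < len(cols) else None
```

With chip column 5 removed from service, `active_pe_columns(5)` leaves 128 columns, and the east-most column of chip 4 (physical column 19) is routed directly to the west-most column of chip 6 (physical column 24), mirroring the jump from the west chip to the east chip described above.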
What is claimed is: 

1. A matrix of a plurality of processing elements inter- 
connected with each other and wherein each processing 
element comprises: 

a memory; 
an adder;

a selectably variable length shift register operatively 
connected to said adder, said shift register compris- 
ing a plurality of individual shift registers having 
gates interposed therebetween, said gates selec- 
tively interconnecting said individual shift regis- 
ters; and 

communication means connected to neighboring pro- 
cessing elements within said matrix and further 
connected to said adder and memory for transfer- 
ring data between said memory, adder, and neigh- 
boring processing elements. 

2. The matrix according to claim 1 wherein each said 
processing element further includes a sum register and a 
carry register operatively connected to said adder. 

3. The matrix according to claim 2 wherein said carry 
register comprises a J-K flip-flop. 

4. The matrix according to claim 1 wherein each said 
processing element includes a logic network capable of 
performing the sixteen logic functions of two bits of 
data, said logic network including a single JK flip-flop. 

5. The matrix according to claim 1 wherein said processing
elements are interconnected in groups, the processing elements
of each group communicating with each other, each group being
operatively connected to neighboring groups for communication
therewith, and wherein each group includes means for removing
the processing elements thereof from communication with
neighboring groups.

6. The matrix according to claim 5 wherein said means for
removing comprise [...] interconnecting inputs and outputs of
said group.

7. The matrix according to claim 5 wherein each said group
includes a sum-or tree of a plurality of OR gates receiving
data bits from a data bus of each processing element within
said group and a parity tree of a plurality of exclusive OR
gates receiving data bits from each such data bus within said
group, the outputs of said OR gates and said exclusive OR gates
being mutually exclusively connected to a single output.

8. The matrix according to claim 7 wherein said exclusive OR
gates of said parity tree are connected to a flip-flop.

9. The matrix according to claim 7 wherein each said group
further includes disable means connected to said sum-or tree
and parity tree for selectively enabling and inhibiting outputs
therefrom.

10. An array of a plurality of processing elements
interconnected with each other and wherein each processing
element comprises:

an adder;

first and second data registers connected with and supplying
data bits to said adder;

a carry register connected to said adder and receiving
therefrom data bits resulting from arithmetic operations and
which functions according to the rule: C ← AP ∨ PC ∨ AC, where
A is the state of said first register, P is the state of said
second register, and C is the state of said carry register;

a memory; and

a data bus interconnecting said first, second, and carry
registers and said memory for the transfer of data thereamong.

11. The array as recited in claim 10 wherein each processing
element further includes a shift register of selectably
variable length interconnected between said first data register
and said adder.

12. The array as recited in claim 11 wherein said carry
register comprises a J-K flip-flop.

13. The array as recited in claim 12 wherein each processing
element further includes a sum register interconnected between
said shift register and said adder, said sum register
functioning according to the rule B ← A ⊕ P ⊕ C, where B, A, P,
and C are respectively the states of said sum, first, second,
and carry registers.

14. The array as recited in claim 10 wherein each said
processing element includes logic means interconnected with
said second register for performing the sixteen logic functions
possible between the data of said second register and a data
bit from said data bus.

15. The array as recited in claim 10 wherein said second
register of each said processing element is communicatingly
interconnected with said second register of orthogonally
neighboring processing elements within the array.
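The carry and sum rules recited in claims 10 and 13, and the sixteen two-bit logic functions recited in claims 4 and 14, can be illustrated in software. The Python sketch below is a behavioral illustration under stated assumptions, not the claimed circuitry: the function names are hypothetical, the operands are assumed to be fed least significant bit first, and the 4-bit truth-table `code` used to select a logic function is an invented encoding that does not appear in the specification.

```python
def full_adder_step(a, p, c):
    """One step of the bit-serial adder.

    Sum rule:   B <- A xor P xor C
    Carry rule: C <- AP or PC or AC
    Returns (b, c_next).
    """
    b = a ^ p ^ c
    c_next = (a & p) | (p & c) | (a & c)
    return b, c_next


def serial_add(x, y, width):
    """Add two unsigned integers one bit at a time, LSB first (assumed).

    Returns (sum modulo 2**width, final carry bit).
    """
    carry = 0
    total = 0
    for i in range(width):
        a = (x >> i) & 1
        p = (y >> i) & 1
        b, carry = full_adder_step(a, p, carry)
        total |= b << i
    return total, carry


def logic_function(code, p, d):
    """Apply one of the sixteen Boolean functions of two bits.

    `code` (0..15) is a hypothetical truth-table encoding: bit (2*p + d)
    of `code` is the result for P register bit p and data bus bit d.
    """
    return (code >> (2 * p + d)) & 1
```

Under this assumed encoding, `code = 0b1000` behaves as AND, `0b1110` as OR, and `0b0110` as exclusive OR. A four-bit code selects every possible truth table over two inputs, which is why exactly sixteen functions exist.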
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